Value-added modeling (also known as value-added analysis and value-added assessment) is a method of teacher evaluation that measures a teacher's contribution in a given year by comparing the current-year test scores of their students to those same students' scores in the previous school year, as well as to the scores of other students in the same grade. In this manner, value-added modeling seeks to isolate the contribution each teacher makes in a given year, which can then be compared to the performance measures of other teachers. Critics say that the use of tests to evaluate individual teachers has not been scientifically validated, and that the results are largely due to chance or to conditions beyond the teacher's control, such as outside tutoring.[1]
Statisticians use a student's past test scores to predict the student's future test scores, on the assumption that students usually score approximately as well each year as they have in past years. The student's actual score is then compared to the predicted score. The difference between the predicted and actual scores, if any, is assumed to be due to the teacher and the school, rather than to the student's natural ability or socioeconomic circumstances.
In this way, value-added modeling aims to separate the teacher's contribution from factors outside the teacher's control that are known to strongly affect student test performance, including the student's general intelligence, poverty, and parental involvement.
By aggregating all of these individual results, statisticians can determine how much a given teacher typically improves student achievement, compared to how much the typical teacher would have improved student achievement.
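The core calculation can be sketched in a few lines of code. The example below is a minimal illustration with invented data and a single prior-year score as the predictor; actual district models typically use several prior years, additional covariates, and more sophisticated statistical machinery, so the column names and the simple least-squares fit here are assumptions made for clarity only.

```python
# Minimal sketch of a value-added estimate: predict current scores from
# prior scores, then attribute the average leftover (residual) to the teacher.
# Data, column names, and the simple linear fit are illustrative assumptions.
import numpy as np
import pandas as pd

records = pd.DataFrame({
    "prior_score":   [480, 520, 450, 610, 500, 530, 470, 590],
    "current_score": [495, 540, 445, 630, 520, 525, 490, 605],
    "teacher":       ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Step 1: predict each student's current score from the prior-year score.
slope, intercept = np.polyfit(records["prior_score"], records["current_score"], 1)
records["predicted"] = intercept + slope * records["prior_score"]

# Step 2: the residual (actual minus predicted) is the portion the model
# attributes to the teacher and school rather than to the student.
records["residual"] = records["current_score"] - records["predicted"]

# Step 3: aggregate residuals by teacher; a positive mean residual means the
# teacher's students gained more than the model predicted, a negative mean
# residual means they gained less.
value_added = records.groupby("teacher")["residual"].mean()
print(value_added)
```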
As of 2010, a few school districts across the United States had adopted the system, including the Chicago Public Schools, the New York City Department of Education, and the District of Columbia Public Schools. The rankings have been used to decide on issues of teacher retention and the awarding of bonuses, as well as to identify the teachers who would benefit most from additional training.[1] Under Race to the Top and other programs advocating better methods of evaluating teacher performance, districts have looked to value-added modeling as a supplement to observing teachers in classrooms.[1]
Louisiana legislator Frank Hoffmann introduced a bill to authorize the use of value-added modeling in the state's public schools as a means to reward strong teachers and to identify successful pedagogical methods, as well as to provide additional professional development for teachers identified as weaker than their peers. Despite opposition from the Louisiana Federation of Teachers, the bill passed the Louisiana State Senate on May 26, 2010, and was immediately signed into law by Governor Bobby Jindal.[2]
Experts do not recommend using value-added modeling as the sole determinant of any decision.[3] Instead, they recommend using it as a significant factor in a multifaceted evaluation program.[4]
Value-added modeling is a norm-referenced evaluation system: a teacher's performance is compared to the results of other teachers in the chosen comparison group. It is therefore possible to use the model to infer that a teacher is better than, worse than, or about the same as the typical teacher, but it is not possible to use it to determine whether a given level of performance is desirable in absolute terms.
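To make the norm-referenced nature of the comparison concrete, the short sketch below converts a set of hypothetical value-added estimates into standings relative to a comparison group; the numbers are invented, and the only point is that the output is relative, not absolute.

```python
# Norm-referenced comparison: express each (hypothetical) value-added estimate
# relative to the comparison group. The figures are invented for illustration.
import numpy as np

value_added = {"A": 4.2, "B": -1.5, "C": 0.3, "D": -3.8, "E": 2.1}
scores = np.array(list(value_added.values()))
mean, std = scores.mean(), scores.std(ddof=1)

for teacher, va in sorted(value_added.items(), key=lambda kv: -kv[1]):
    z = (va - mean) / std                 # standing relative to peers
    pct = (scores <= va).mean() * 100     # percentile within this group
    print(f"teacher {teacher}: z = {z:+.2f}, percentile = {pct:.0f}")

# A positive z-score only says "better than the typical teacher in this group";
# it says nothing about whether that level of performance is desirable.
```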
Because each student's expected score is largely derived from the student's actual scores in previous years, it is difficult to use this model to evaluate teachers of kindergarten and first grade. Some research limits the model to teachers of third grade and above.
Schools may not be able to obtain new students' prior scores from the students' former schools, or the scores may not be useful because of the non-comparability of some tests. A school with high levels of student turnover may have difficulty in collecting sufficient data to apply this model. When students change schools in the middle of the year, their progress during the year is not solely attributable to their final teachers.
Value-added scores are more sensitive to teacher effects for mathematics than for language.[3] This may be due to widespread use of poorly constructed tests for reading and language skills, or it may be because teachers ultimately have less influence over language development.[3] Students learn language skills from many sources, especially their families, while they learn math skills primarily in school.
There is some variation in scores from year to year and from class to class. This variation is similar to that seen in performance measures in other fields, such as Major League Baseball, and thus may reflect real, natural variation in a teacher's performance.[3] Because of this variation, scores are most accurate when they are derived from a large number of students (typically 50 or more). As a result, it is difficult to use this model to evaluate first-year teachers, especially in elementary school, as they may have taught only 20 students. A ranking based on a single classroom is likely to classify the teacher correctly about 65% of the time; this figure rises to 88% if ten years' data are available.[5] Additionally, because the confidence interval is wide, the method is most reliable for identifying teachers who are consistently in the top or bottom 10%, rather than for drawing fine distinctions among teachers who produce roughly typical gains, such as determining whether a teacher should be rated slightly above or slightly below the median.[5]
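The effect of sample size on the reliability of these rankings can be illustrated with a small simulation. The noise level, effect sizes, and class sizes below are assumptions chosen for illustration rather than estimates drawn from the studies cited above, but the qualitative pattern is the same: with only a classroom's worth of students, many teachers land on the wrong side of the median, and the error rate falls as more student results accumulate.

```python
# Simulation sketch: how often does a noisy value-added estimate put a teacher
# on the correct side of the median? All parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000
true_effect = rng.normal(0.0, 1.0, n_teachers)   # each teacher's "real" effect

def classification_accuracy(n_students, noise_sd=4.0):
    # Estimation noise shrinks as more students contribute to the estimate.
    noise = rng.normal(0.0, noise_sd / np.sqrt(n_students), n_teachers)
    observed = true_effect + noise
    truth = true_effect > np.median(true_effect)
    guess = observed > np.median(observed)
    return (truth == guess).mean()

for n in [20, 50, 200, 500]:
    print(f"{n:3d} students per estimate: ~{classification_accuracy(n):.0%} classified correctly")
```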
The idea of judging the effectiveness of teachers based on the learning gains of students was first introduced into the research literature in 1971 by Eric Hanushek, an economist currently at Stanford University.[6] It was subsequently analyzed by Richard Murnane of Harvard University among others.[7] The approach has been used in a variety of different analyses to assess the variation in teacher effectiveness within schools, and the estimation has shown large and consistent differences among teachers in the learning pace of their students.[8]
Statistician William Sanders, a senior research manager at SAS, introduced the concept to school operations when he developed value-added models for school districts in North Carolina and Tennessee. First created as a teacher evaluation tool for school programs in Tennessee in the 1990s, the technique saw expanded use after the passage of the No Child Left Behind legislation in 2002. Based on his experience and research, Sanders argued that "if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers."[1]
A 2003 RAND Corporation study prepared for the Carnegie Corporation of New York said that value-added modeling "holds out the promise of separating the effects of teachers and schools from the powerful effects of such noneducational factors as family background". It also noted that studies had shown a wide variance in teacher scores under such models, which could make value-added modeling an effective tool for evaluating and rewarding teacher performance if that variability could be substantiated as linked to the performance of individual teachers.[9]
The Los Angeles Times reported on the use of the program in that city's schools, creating a searchable web site that provided the scores calculated by the value-added modeling system for 6,000 elementary school teachers in the district. United States Secretary of Education Arne Duncan praised the newspaper's reporting on the teacher scores, citing it as a model of increased transparency, though he noted that greater openness must be balanced against concerns regarding "privacy, fairness and respect for teachers".[1] In February 2011, Derek Briggs and Ben Domingue of the National Education Policy Center (NEPC) released a report reanalyzing the same dataset from the L.A. Unified School District in an attempt to replicate the results published in the Times. They found serious limitations in the earlier work, concluding that the "research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings."[10]
The Bill and Melinda Gates Foundation is sponsoring a multi-year study of value-added modeling through its Measures of Effective Teaching program. Initial results, released in December 2010, indicate that both value-added modeling and student perceptions of several key teacher traits, such as control of the classroom and challenging students with rigorous work, correctly identify effective teachers.[3] The study of student evaluations was conducted by Ronald Ferguson. The study also found that teachers who teach to the test are much less effective, and have significantly lower value-added modeling scores, than teachers who promote a deep conceptual understanding of the full curriculum.[3] A reanalysis of the MET report's results by Jesse Rothstein, an economist and professor at the University of California, Berkeley, disputes some of these interpretations, however.[11] Rothstein argues that the analyses in the report do not support its conclusions, and that "interpreted correctly... [they] undermine rather than validate value-added-based approaches to teacher evaluation."[12]
A report issued by the Economic Policy Institute (EPI) in August 2010 acknowledged that "American public schools generally do a poor job of systematically developing and evaluating teachers" but expressed concern that using performance on standardized tests as a measuring tool would not lead to better performance. The report recommended that measures based on standardized test scores be one factor among many, so as to "provide a more accurate view of what teachers in fact do in the classroom and how that contributes to student learning." It called value-added modeling a fairer means of comparing teachers that allows for better measures of educational methodologies and overall school performance, but argued that student test scores were not sufficiently reliable as a basis for "high-stakes personnel decisions".[13]
Edward Haertel, who led the Economic Policy Institute research team, wrote that the methodologies being pushed as part of the Race to the Top program placed "too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals" and that the techniques of value-added modeling need to be more thoroughly evaluated and should only be used "in closely studied pilot projects".[1]
Several alternatives for teacher evaluation have been implemented.
Most experts recommend using multiple measures to evaluate teacher effectiveness.[15]